feat: Add support for Qwen2.5Omni and MiniCPM-o-4_5 models #612

Open: KKkai0315 wants to merge 46 commits into UbiquitousLearning:main from KKkai0315:main

KKkai0315 (Contributor) commented Jan 23, 2026

  1. Add support for the Qwen2.5Omni model (the talker part still has some problems).
  2. It can now accept text, image, and audio input and generate text output.
  3. Add support for the MiniCPM-o-4_5 model.

Summary by CodeRabbit

  • New Features

    • Full multimodal Qwen2.5 Omni: text, vision, audio support with tokenizer, audio preprocessing, comprehensive configs, and interactive example CLIs.
    • MiniCPM‑o4.5: multimodal model with TTS and end-to-end token→wav synthesis, prompt cache tools, tokenizers, and example CLIs.
  • Backends & Layers

    • CPU backend adds ConvTranspose1D and Tanh ops with corresponding NN layer support.
  • Tools

    • New scripts to convert token2wav formats and export prompt caches.
  • Tests

    • Kernel/unit tests for ConvTranspose1D and Tanh.
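
For reviewers unfamiliar with the two new ops, here is a minimal reference sketch of what ConvTranspose1D and Tanh compute. It is not the kernel from this PR: the function names are hypothetical, the weight layout `(in_channels, out_channels, kernel)` is assumed to follow the PyTorch convention, and bias, groups, dilation, and output padding are left out. A reference like this is the sort of thing a kernel test might compare against.

```python
import math


def conv_transpose1d(x, w, stride=1, padding=0):
    """Naive 1D transposed convolution (illustrative reference only).

    x: input as a list of channels, shape [c_in][length]
    w: weights, assumed shape [c_in][c_out][kernel]
    Output length: (length - 1) * stride - 2 * padding + kernel
    """
    c_in = len(x)
    length = len(x[0])
    c_out = len(w[0])
    kernel = len(w[0][0])

    # Scatter each input sample into an unpadded output buffer.
    full_len = (length - 1) * stride + kernel
    full = [[0.0] * full_len for _ in range(c_out)]
    for ci in range(c_in):
        for i in range(length):
            for co in range(c_out):
                for j in range(kernel):
                    full[co][i * stride + j] += x[ci][i] * w[ci][co][j]

    # Padding trims that many samples from both ends of the result.
    return [row[padding:full_len - padding] for row in full]


def tanh(values):
    """Element-wise hyperbolic tangent over a flat list."""
    return [math.tanh(v) for v in values]


if __name__ == "__main__":
    x = [[1.0, 2.0]]          # one channel, two samples
    w = [[[1.0, 1.0, 1.0]]]   # one input channel, one output channel, kernel 3
    print(conv_transpose1d(x, w, stride=2))
    print(tanh([0.0, 1.0]))
```

With stride 2 each input sample is written two positions apart, so the two kernel footprints overlap at the middle output position and their contributions are summed there.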


3 participants